The genetic structure and demographic history of Zabelia tyaihyonii, endemic to Korean limestone karst forests, based on genome‐wide SNP markers

Abstract Similar to the global phenomenon, many plant species endemic to Korean limestone karst forests are at risk of extinction due to human intervention. Zabelia tyaihyonii is a familiar shrub, called “Hardy abelia” and “Fragrant abelia” growing in the karst forests of Korea, where it is one of the most threatened species. We investigated the genetic structure and demographic history of Z. tyaihyonii, which allow us to develop appropriate conservation and management strategies. The genetic structure was evaluated using a total of 187 samples from 14 populations, covering the entire distribution of Z. tyaihyonii in South Korea. We utilized 254 and 1753 SNP loci obtained via MIG‐seq (Multiplexed ISSR Genotyping by sequencing) for structure and demographic analyses, respectively. The population demographic modeling was performed with site frequency spectrum. To gain further historical insights, we also employed ENM (Ecological Niche Modeling). We found two distinct clusters (CLI and CLII) of ancient origin (ca. 490 ka). Despite CLII experiencing a more severe bottleneck, both clusters showed similar levels of genetic diversity, indicating mutual historical gene flow. Their historical distribution range seems to have changed very little. We proposed a historical distribution scenario for Z. tyaihyonii, taking into account its intrinsic factors, and emphasized a more complex response to Quaternary climate change beyond simple allopatric speciation models. These findings provide valuable insights for conservation and management strategies for Z. tyaihyonii.


| INTRODUC TI ON
The population genetic structure for a given plant species could be influenced by local differences in various extrinsic factors, such as availability of pollinators , seed dispersers (Calviño-Cancela et al., 2012), soil components (Nagasawa et al., 2020;Yamamoto et al., 2020), and climate conditions (Bisbing et al., 2021;Galliart et al., 2019). The degree of gene flow among populations is also considered a primary factor in shaping the population genetic structure (Tamaki et al., 2021). Although the relative importance of each factor may spatiotemporally change, the patterns of genetic differentiation observed in the majority of plants reflect historical, rather than current factors, particularly in East Asia (e.g., Cho et al., 2020;Han et al., 2020;Kikuchi & Osone, 2021). The population establishment history with demographic events is also associated with the Quaternary climatic oscillations that periodically affected both habitat fragmentation and connectivity, as well as geographical distribution range (Chung et al., 2017;Hewitt, 2000Hewitt, , 2003. The determinants of genetic variation obtained by this process may play a key role in the persistence of a species, by enabling the populations to adapt to rapidly changing environments (Hohenlohe et al., 2021;Médail & Baumel, 2018). Therefore, revealing the genetic structure of plant species and identifying the adaptive significance of specific populations can be the basis for establishing effective conservation strategies.
To minimize the extinction risks of populations, there are a growing number of population-scale conservation studies that infer their evolutionary histories. With advances in next-generation sequencing technology, various genetic frameworks have been applied to identify genetic information of a species underlying conservation managements (e.g., GBS, Lee et al., 2022;RAD-seq, Stojanova et al., 2020). MIG-seq (Multiplexed ISSR Genotyping by sequencing) is one of the various genotyping by sequencing (GBS) methods and can be an ideal technique for producing large datasets costeffectively at the population level (Suyama & Matsuki, 2015). It is advantageous when inferring past demographic events that might leave strong genetic footprints (Ishii et al., 2022;Yoichi et al., 2021).
Zabelia tyaihyonii (Nakai) Hisauti & H. Hara is a temperate deciduous, broad-leaved shrub in the family Caprifoliaceae that is endemic to the dry karst forests on the Korean Peninsula (Jeong et al., 2007;Figure 1). This species is known to have adaptively evolved on calcareous soils . Z. tyaihyonii has potential value for landscaping and gardening because of its colorful flowers (pale pink), excellent fragrance, and strong cold tolerance syn. Abelia mosanensis T.H. Chung ex Nakai, see Rose, 2013), and has been traded under the names "Hardy abelia" and "Fragrant abelia", since its introduction in the United States in 1989 Sim & Seo, 1995). In South Korea, the Korea Forest Service has listed Z. tyaihyonii as an endangered species due to its extreme rarity and limited distribution in the karst forests (Korea National Arboretum, 2012). Nevertheless, concerns about their habitat destruction due to the mining industry are constantly emerging, as well as the over-collecting of specimens for commercial purposes (Chae et al., 2023;Jeong et al., 2007;Kim et al., 2022).
The Korean limestone karsts are disjunctively separated on a smaller scale by non-calcareous substrata, forming two zones, one each in South Korea and North Korea. Although Z. tyaihyonii is considered an endangered species in North Korea (Ju et al., 2016), little is known about its current status, and it is most likely not being protected in any way in North Korea. The karst forests in South Korea are home to many species, and about 100 vascular plant species grow preferentially in the limestone forests (Chung et al., 2013).
In a given region, the accumulating genetic information for plant species can prove useful in setting conservation priorities (Chung et al., 2017(Chung et al., , 2018 (Nakai) Nakai: Chung et al., 2013;Saussurea chabyoungsanica Im: Jeong et al., 2012]. Also, previous studies of Z. tyaihyonii have sampled a very narrow range, and lacked historical inferences (Jeong et al., 2007;Kang et al., 2023). Therefore, our insight into its effective conservation is very limited.
A number of studies have been conducted to explore the effects of changes in geographical distribution due to Quaternary climatic fluctuations on the genetic diversity of temperate organisms, usually warm-adapted (Petit et al., 2008). As a result, the focus of these studies has predominantly been on the latitudinal movements of warm-adapted, species (e.g., Jin et al., 2021;Lee et al., 2013). The limited perspective has hindered our understanding of the diverse behavioral patterns exhibited by various species. To broaden our knowledge, it is essential to investigate the potential vertical movement and other factors of cold-tolerant temperate plant species within specific areas . Therefore, it may be important to consider what historical distributional changes may have occurred for our target species that have evolved to adapt to limited limestone zones.
Ecological niche modeling (ENM) provides a powerful framework for investigating the interplay between ecological and evolutionary factors that have contributed to the genetic diversity of populations (Alvarado-Serrano & Knowles, 2014). This approach is particularly useful in understanding the distribution and genetic structure of populations during Quaternary glacial cycles, even at small spatial scales Jin et al., 2021). To shed light on the historical migration patterns of Z. tyaihyonii, we utilized ENM and population modeling techniques, including site frequency spectrum analysis. This will allow us to gain a better understanding of the factors that have shaped the genetic makeup of this species.
In this study, we aimed to explore the evolutionary history and demographic dynamics for populations of Z. tyaihyonii in South Korean karst forests. The specific goals of the present study are (1) to investigate their genetic diversity and structure, (2) to evaluate the evolutionary potential and conservation value to the populations from inferring the demographic history, and (3) to provide effective conservation guidelines for the recovery and management from a long-term conservation genetics perspective.

| Study species and population sampling
Lasting problems have arisen with the use of the scientific name for this distinct and apparent taxon, Z. tyaihyonii. Controversy exists over whether the taxa distributed in the Korean Peninsula should be treated as the genus Zabelia (Rehder) Makino or Abelia R. Br. (Chung & Sun, 1984;Paik & Lee, 1989). However, a recent phylogenetic study suggested that Zabelia is a monophyletic group not closely related to Abelia (Jacobs et al., 2010). This is one of the reasons we recognize target species as Z. tyaihyonii. Nakai (Nakai, 1921(Nakai, , 1926. Zabelia mosanensis was described as distinct from Z. tyaihyonii by the features of having larger flowers (corolla and calyx) with hairs on the filaments. However, distinguishing these two taxa is inherently difficult because the reference specimens of Z. tyaihyonii were collected during the fruiting season (Sun, 1999). Moreover, Z. tyaihyonii also have hairs on their filaments . There is no evidence that Z. mosanensis has been found anywhere other than in Z. tyaihyonii's habitats. Consequently, Z. mosanensis is regarded as a synonym of Z. tyaihyonii (Sun, 1999), that view is well maintained among Korean researchers (Chae et al., 2023;Kang et al., 2023;Won & Kim, 2007).

| MIG-seq analysis
To obtain genome-wide single-nucleotide polymorphism (SNP) data, we used the MIG-seq method. Preparation of the MIG-seq library was conducted under standard conditions, according to Suyama et al. (2022), without a dark cycle. The first PCR was performed using the MIG-seq primer set-1 to amplify ISSR regions from genomic DNA. The first PCR products were identified by visualization on 0.8% agarose gels and then used as templates for the second PCR (tailed PCR). The second PCR was conducted to add individual indices to each sample using the indexed forward and reverse primers. Each second PCR product was pooled as a single mixture library, and the mixture was purified with a QIAquick PCR purification Kit (Qiagen). The purified mixture was size-selected in the range of 350-650 bp using the BluePippin system (Sage Science). We sequenced a dual-indexed library containing all samples using Illumina HiSeq X at Macrogen, and 151 bp paired-end reads were obtained.
The MIG-seq raw reads were grouped into those from individual samples using the index read option of the sequencer.

| SNP detection
For quality control of the raw reads, the first 17 bp, including partial primer sequences at the 5′ end of each raw read, were trimmed (Nishimura et al., 2022) using the program fastx_trimmer in the FASTX-Toolkit 0.0.14 (http://hanno nlab.cshl.edu/fastx_toolk it/) so that the length of each read became 134 bp. The low-quality reads were removed using the program quality_filter in the FASTX-Toolkit, and extremely short sequence reads were then removed by TagDust 1.12 (Lassmann et al., 2009). The quality-filtered reads were analyzed with Stacks v.2.5 (Catchen et al., 2013) to detect SNPs. The stacks in each sample were built by de novo assembly with the option of minimum depth coverage required to create a stack (m) = 3 and maximum difference in nucleotides allowed between stacks (M) = 2.
The SNP dataset was created using the 'populations' command. The minimum number of populations at a locus (-p) was set to 14, the total number of populations. We also included only the first SNP per locus (--write-single-snp) to avoid bias towards specific stacks, and set the minimum minor allele frequency to 0.03 (--min-maf 0.03) and maximum observed heterozygosity to 0.60 (--max-obs-het 0.60).
The polymorphic loci values for the Stacks were determined under percent samples limit per population (-r) = 1.00, 0.90, 0.85, and 0.80, respectively. The optimal values of SNP dataset were determined when the number of polymorphic loci appeared to be maximized or stabilized around the maximum.

| Analysis of population genetics statistics and population structure
Population genetics statistics were estimated based on MIG-seq SNP data using GenAlEx version 6.502 (Smouse & Peakall, 2012).
These included the number of private alleles (P A ), observed heterozygosity (H O ), expected heterozygosity (H E ), and inbreeding coefficient (F IS ). Nucleotide diversity (π) was calculated using the 'populations' command of Stacks. Allele richness (A R ) and genetic differentiation among populations were determined according to the method of Weir and Cockerham (1984), using FSTAT version 2.9.4 (Goudet, 1995).
To infer the population structure of Z. tyaihyonii, we used a Bayesian clustering approach implemented in STRUCTURE version 2.3.4 (Pritchard et al., 2000) using 100,000 Markov Chain Monte Carlo (MCMC) iterations (10,000 burn-in, with admixture). The simulation used 20 iterations, with K = 1-14 clusters. We determined the optimal number of clusters, K value, using STRUCTURE HARVESTER (Earl & VonHoldt, 2012). CLUMPP version 1.1.2 (Jakobsson & Rosenberg, 2007) with the Greedy algorithm was used to combine the membership coefficient matrices (Q-matrices) from 1000 iterations using random input orders. We also conducted a principal coordinate analysis (PCoA) to understand the genetic relationships among individuals and among populations using the covariance standardized approach of pairwise Nei's genetic distances in GenAlEx.
To visualize relationships among populations, we constructed a split network using SplitsTree4 version 4.14.5 with the Neighbor-Net algorithm (Huson & Bryant, 2006). We performed Mantel test (Mantel, 1967) to test for the presence of isolation-by-distance (IBD) using GenAlEx. We ran the analysis with 999 random permutations to evaluate the relative correlation of genetic differentiation and geographical distance between populations.

| Inference of population demography
In order to identify divergence times, patterns of migrations, and population size changes, a coalescent based maximum likelihood analysis estimating parameters of the population demographic model was performed with fastsimcoal2 version 2709 (Excoffier et al., 2021). We assumed two populations, cluster I and cluster II, based on the results of our Structure analysis. Four twopopulation divergence models were constructed (Figure 2). Model 1 (no-migration model) assumed no-migration between populations.
Models 2 (isolation with migration model) assumed continuous migration during from current to T 2 . Model 3 (ancient migration model) assumed ancient migration during from T 1 to T 2 . Model 4 (secondary contact model) assumed recent migration during from current to T 1 .
We also assumed different population size change in different three time periods, from current to T 1 , from T 1 to T 2 and before T 2 .
To make the input data for fastisimcoal2, as we need information that how many SNPs were obtained from sequences of certain lengths, SNPs were re-extracted using populations of stacks with options of -p 2 -r 0.80 --max-obs-het 0.60 --min-maf 0.0. The reextracted data was constructed from 242,107 all sites, 4178 variant sites (SNPs) and 1753 loci (stacks). Two dimensional minor allele site frequency spectrum (2D-mSFS) was calculated from the .vcf file with our own R script (https://github.com/garag eit46/ 2D-msfs-R).
Missing data was compensated by a bootstrapping within the cluster.
The likelihood of each model was maximized from 50 random starting values, 40 expectation-conditional-maximization (ECM) optimization cycles, and 100,000 coalescent simulations.

F I G U R E 2
Comparison of four two-population divergence models. N, effective population size; Nm i-j , number of migrants per generation from cluster j to i (its direction is forward-in-time); T, event time. The period shown in dashed lines assumes migration between clusters.
Mutation rate was set to 5.39 × 10 −8 per site per generation, which was estimated in a woody shrub, Lonicera oblata (7.7 × 10 −8 per site per year × 7 years per generation; Mu et al., 2022). We calculated Akaike's information criterion (AIC) and selected the model with the lowest AIC value as the best model. Goodness of the fitness of the best model was checked by comparison of observed and simulated 2D-mSFS visually. Confidence interval of the best model was calculated by a parametric bootstrapping. We simulated the model by fastsimcoal2 with the maximum likelihood estimate parameter values 100 times and obtained its 2D-mSFS. Using the simulated 2D-mSFS as an input data, parameters of the best model were recalculated with the observed parameter values as a stating value, 15 ECM cycles and 100,000 coalescent simulations. Finally, 95% confidence intervals (CI) were calculated from the obtained parameter values. We assumed a generation time of 20 years for Z.
tyaihyonii based on the fact that its phylogenetically closely related and commonly cultivated taxon, Abelia × grandiflora (Andrè) Rehd., takes 10-20 years to reach its ultimate height (Talhouk et al., 2015). Considering that the growth of Z. tyaihyonii in nature may be considerably slower than in garden conditions, we have maximized the generation time to 20 years. Consequently, twenty years per generation was used to convert an event time from generations ago to years ago. excepted. After preliminary modeling analysis using the remaining bioclimatic variables, the one revealing a flat response curve was also excluded. Therefore, four bioclimatic variables and one lithological data were used in developing distribution models targeting 33-43°N latitude and 123-132°E longitude. To reduce the uncertainty resulting from using the different GCMs, the LGM distribution models corresponding to three GCMs were averaged (Wiens et al., 2009). We use Maxent 3.4.4 (Merow et al., 2013) with batch mode, and it was set with cross-validation, jackknife tests, regularization parameter 2, 20 replicates, cloglog output, random seeds, and 10,000 background points and 2000 iterations.

| Genetic diversity and genetic structure
In total, we obtained 462,669,080 raw reads from the 187 samples

| Inference of population demography
Isolation with the migration model (model 2) was selected as the best model ( Table 2). Goodness of fitness was visually confirmed by plotting observed and simulated 2D-mSFSs ( Figure S2). As the CIs were not duplicated, current effective population size of CLI (N CLI_0 = 5008, 95% CI: 3266-6873) was significantly larger than that of CLII (N CLII_0 = 1766, 95% CI: 1185-2673; Table 3). Both clusters showed significant recent reduction in population size at T 1 (2.6 ka, Note: The best model was shown in bold. Abbreviations: N, effective population size; Nm i-j , number of migrants per generation from cluster j to i (its direction is forward-in-time); T, event time.

TA B L E 3 Maximum likelihood estimate (MLE) and 95%
confidence interval (CI) of parameters in the best two-population divergence model (Model 2 in Figure 2).

Parameter MLE (95% CI)
N CLI_0 5008 ( Abbreviations: CLI and CLII indicate cluster I and cluster II, respectively; N, effective population size; N CLX_0 and N CLX_1 indicate effective population size of cluster X during from current to T 1 and from T 1 to T 2 , respectively; Nm i-j , number of migrants per generation from cluster j to i (direction of migration is forward-in-time); T, event time.

| Ecological niche modeling
Current predicted distribution of Z. tyaihyonii was made by using Maxent modeling and revealed high AUC (0.977). The variable showing the highest percent contribution is glim (lithological data; 52.4%) and second one is bio04 (temperature seasonality; 18.2%). The current predicted distribution shows that it is significantly associated with the limestone area in Korean Peninsula ( Figure 6). The LGM distribution, which was projected and averaged from three GSMs, reveals that there was only a low suitable area (<0.6) for the species, which is also related to the limestone area.

| DISCUSS ION
We used the MIG-seq technique to study the genetic diversity of Z. tyaihyonii, a karst-restricted plant species in South Korea that is under threat. This allowed us to conduct a detailed genetic assessment of this endangered shrub in a small area. Our analysis showed that there are two genetically distinct clusters, CLI and CLII of ancient origin (ca. 490 ka), and we were able to identify the historical and intrinsic factors that shaped their genetic composition. Despite CLII having a more severe bottleneck, both the clusters had similar levels of genetic diversity. This can be attributed to bidirectional historical gene flow that contributed to the genetic diversity of both groups without compromising their genetic uniqueness. We propose a historical distribution scenario for Z. tyaihyonii that considers its intrinsic factors, and highlights a more complex scenario that takes into account its response to Quaternary climatic fluctuations, beyond simple allopatric speciation models.

| Population establishment history and genetic structure
We found that the Z. tyaihyonii populations are structured in two genetically different clusters, as shown in the STRUCTURE, Neighbor-Net phylogeny, and PCoA analyses with a significant IBD pattern.
However, genetic variation in the CJ_S6 population appears to be inconsistent with the geographical proximity of the two clusters in the limestone zones. This population is isolated from other popula- LGM. This result may imply that their current distribution may have been largely unaffected by the Quaternary glacial-interglacial oscillations that contributed to genetic variation and geographic adequacy Tamaki et al., 2018Tamaki et al., , 2021. We suggest that the strong cold tolerance of Z. tyaihyonii made it possible to avoid forcing the southern retreat of the distribution range by climate cooling during glacial periods. Indeed, Z. tyaihyonii (= Abelia mosanensis) can withstand minimum temperatures from −28.9 to −23.3°C, thus it is considered cold hardy (as Abelia mosanensis in Rose, 2013).
It is estimated that temperatures at the peak of the past glacial maximum were about 6 degrees cooler than today (Golledge et al., 2012;Hulton et al., 2002;Seltzer et al., 2021). Considering that the av-

| Genetic diversity and historical gene flow
We detected signs of a recent bottleneck in both CLI and CLII, which can be inferred from the change in the effective population size ( Table 2). However, the degree of bottleneck was significantly stronger in CLII. A bottleneck event is almost universally accompanied by a loss of genetic diversity (e.g., Jacquemyn et al., 2010;Landergott et al., 2001;Zheng et al., 2012), and thus we might expect to observe lower genetic diversity in populations that have experienced a stronger bottleneck. Interestingly, our results show similar levels of genetic diversity between the two clusters at both the population and pooled sample levels. This suggests that there may be intrinsic or other external factors that have helped to maintain genetic diversity in CLII and potentially in this species as a whole.
The one of most plausible factors can be found in symmetric and persistent historical gene flow, as shown in our demographic analysis ( Figure 5). Although the two genetic clusters are genetically distinct, there exists moderate level of historical gene flow. This level of gene flow is likely not sufficient to significantly impact the genetic distinctiveness of each cluster (e.g., Tamaki et al., 2018Tamaki et al., , 2021. Another possibility can be on their flexible reproductive system of Z. tyaihyonii (sexual and asexual reproduction). Due to their flowering time (May), strong floral scent, and floral structure, these shrubs are considered to be active in cross-fertilization, but often give rise to generations through vegetative reproduction (Jeong et al., 2007).

| Conservation implications
Based on full range of analyses, we recommend that conservation strategies for Z. tyaihyonii populations in South Korea should be developed to maintain the unique genetic identity of each cluster.
Additionally, since it is uncertain whether other Z. tyaihyonii popula- Random genetic drift tends to decrease evolutionary adaptation and to fix harmful alleles to populations, making them maladapted to a changing environment (Stetter et al., 2018;Weeks et al., 2016).
Nevertheless, we do not believe that increasing genetic diversity in these populations should be the primary focus of conservation strategies. The current level of genetic diversity in Z. tyaihyonii may be a strategic outcome that has contributed to their long-term survival, potentially due to their flexible outcrossing and vegetative reproductive systems. Moreover, our analysis suggests that the uniform genetic diversity across the entire populations may be a result of historical environmental changes, rather than recent human-induced destruction (e.g., Tamaki et al., 2016). Therefore, we propose that the current level of genetic diversity in these two clusters may be well-suited for their long-term survival.
From a long-term conservation perspective, we highlight that Z.
tyaihyonii populations in South Korea should be conserved and managed in a way that allows for natural migrations between the two clusters. Rather than considering the clusters as completely separate units, a strategy that promotes further increases in population size should be adopted. To prevent further isolation, land development in areas that could serve as corridors or migration routes between the clusters should be avoided. When selecting donor populations for each cluster, it is crucial to focus on genetic suitability rather than geographic proximity, especially for the CJ_S6 population.

ACK N OWLED G M ENTS
We sincerely thank the following researchers for their comments, which greatly improved the previous version of this manuscript: Dr.
S.K. So (Korea National Park Service), Dr. D.P. Jin (Sejong National Arboretum), and Dr. J.W. Park (Chonnam National University). We would also like to express our gratitude to the three anonymous reviewers who have put in efforts to improve the manuscript.